Abstract. The generic identification problem is to decide whether a stochastic process $(X_t)$ is a
hidden Markov process and, if yes, to infer its parameters for all but a subset of parametrizations
that form a lower-dimensional subvariety in parameter space. So far, only partial answers to either
the decision or the inference part have been given, all of which depend on extra assumptions on the
processes, such as stationarity. Here we present a general solution for binary-valued hidden Markov
processes. Our approach is rooted in algebraic statistics and is hence geometric in nature. We find
that the algebraic varieties associated with the probability distributions of binary-valued hidden
Markov processes are zero sets of determinantal equations, which draws a connection to well-studied
objects from algebra. As a consequence, our solution allows for algorithmic implementation based
on elementary (linear) algebraic routines.
1. Introduction

Hidden Markov processes (HMPs) have gained widespread interest in statistics, predominantly
due to their striking successes in applications. Central theoretical concerns have revolved
around the fundamental problems of identifiability and complete identification. Here and in
the following, stochastic processes $(X_t)$ take values in a finite set (alphabet) $\Sigma$,
where binary-valued refers to the case $|\Sigma| = 2$.

Problem 1.1 (Complete Identification). Decide whether a stochastic process (Xt) is a 
hidden Markov process and if yes, infer its parameters. 

The problem was raised already in the late 1950s and early 1960s. A representative list
of references is [6, 12-14, 19, 23, 25, 33]; see also the more recent contributions [2, 3,
22, 38] and the exhaustive list of references in [18]. See also [3, 4] for HMM parameter
estimation from data and [9] for a textbook on related practical issues. From a practical
point of view, it is reasonable to solve problem 1.1 for all but a null set of
parametrizations, which explains why even the most recent contributions [22, 38] provide
generic solutions. That is, the solutions apply to all but a subset of parametrizations that
form a lower-dimensional subvariety in parameter space.

The above-mentioned treatments usually impose extra assumptions on the processes,
often centered around stationarity. The only exception is Heller, who provided a polyhedral
cone-based characterization of arbitrary, including non-stationary, HMPs [27] which, however,
was exposed as a reformulation rather than a solution [2] in the sense that it does not give
rise to an algorithmic solution of problem 1.1. To date, problem 1.1 cannot be considered
fully resolved.

The fact that one can assign every probability distribution $P : \Sigma^n \to [0, 1]$ over
finite-length strings to a HMP on $|\Sigma|^n$ states (a well-known exercise: the hidden states
of the HMP form a de Bruijn graph over $\Sigma^n$, together with the obvious transition
probabilities) introduces further complications when aiming at algorithmic solutions. We therefore
turn our attention to the following finite reformulation of problem 1.1.

Problem 1.2 (Finite Identification). Let $P : \Sigma^n \to [0, 1]$ be a probability distribution
over strings of finite length $n$. Decide whether $P$ is due to a HMP on $d$ hidden states and,
if yes, infer its parameters.

In the course of this paper, we provide a generic solution to problem 1.2 for binary-valued
alphabets in case of $d \le \frac{n+1}{2}$, that is, $n \ge 2d - 1$. Our solution is rooted in
algebraic statistics, where we draw in particular from the concept of an algebraic statistical
model, as described in [16, 31]. See for example [24, 37] for discussions on Bayesian networks,
which, as latent variable models, are related to hidden Markov models. Since, as is well known [32],
HMPs are uniquely determined by their distributions over strings of length $2d - 1$, a
solution of problem 1.2 also gives rise to a solution of the original problem 1.1:

1. For each $n \in \mathbb{N}$ determine $d(n)$ as the minimal number of hidden states such that
the answer in the 'Decision' part of problem 1.2 is 'Yes'. In case there is no such
$d \le \frac{n+1}{2}$, set $d(n) := \infty$.

2. If $d(n)$, $n \in \mathbb{N}$, converges, output 'Yes' and infer the corresponding parameters.
If not, output 'No'.

Note that a process $(X_t)$ constitutes infinite input, so an infinite procedure is all one can
expect. Overall, a generic, algorithmic solution of problem 1.2, and hence of problem 1.1, for
binary-valued processes without further assumptions on the processes has not been
presented in the literature before.

We denote the set of parameters of HMPs with $d$ hidden states by $\mathcal{H}_{d,+}$. By (3.6)
below, $\mathcal{H}_{d,+}$ is a full-dimensional subset of the positive orthant of real affine
space $\mathbb{R}^{d^2 + d(|\Sigma|-1) - 1}$. In the form of a theorem, our solution to problem 1.2
reads as follows.

Theorem 1.3. Let $|\Sigma| = 2$, $d \le \frac{n+1}{2}$ and let $P : \Sigma^n \to [0,1]$ be a
probability distribution. There is an algebraic variety $\mathcal{N}_d$ in parameter space such
that $\dim \mathcal{N}_d < d^2 + d - 1$ and an algorithmic routine $A$ which, when given $P$ as
input, outputs

$$A(P) = \begin{cases}
\text{'HMP on } d \text{ hidden states'} & P \in f_{n,d}(\mathcal{H}_{d,+} \setminus \mathcal{N}_d) \\
\text{'Cannot decide'} & P \in f_{n,d}(\mathcal{N}_d) \\
\text{'No HMP on } d \text{ hidden states'} & \text{else}
\end{cases}$$

In the first case, $A$ also outputs the parametrization, which is unique up to permutation of
hidden states.

In the course of collecting related arguments, we provide an ideal-theoretic characterization
of the varieties associated with finitary processes with arbitrary output alphabets and, based
on dimension arguments, point out that the varieties of finitary and hidden Markov processes
coincide for binary alphabets. Relationships between finitary processes and HMPs have been
noted already in seminal work on identification of HMPs (e.g. [6, 12, 13, 25, 27]). Here we
review them from the point of view of algebraic statistics. Corresponding results are
summarized into the ideal-theoretic theorem 6.7, which is based on the set-theoretic lemma 6.8.
Note that the ideals we encounter are determinantal in nature; corresponding relationships for
latent variable models have also been noted in [7, 11, 39].

Organization of Chapters. In section 2, we give the basic definition of an algebraic
statistical model and also the definition of an algebraic process model, which serves the 
general purpose to treat stochastic processes in algebraic statistical settings. In section 
3, we give formal definitions of finitary and hidden Markov processes. In section 4, we 
give the definitions of their algebraic statistical counterparts, the finitary and the hidden 
Markov process model. Along with these definitions, we provide a brief list of fundamental 
relationships. In section 5 we compute the dimensions of the algebraic varieties associated 
with finitary and hidden Markov process models. A crucial observation drawn in this 
section is that the varieties of both models coincide for binary-valued output alphabets. 
In section 6 we provide a Hankel-matrix-based characterization of finitary models hence 
also of binary-valued hidden Markov models, the ideal-theoretic formulation of which is 
documented as the major theorem 6.7. In section 7 we present the algorithm on which 
theorem 1.3 from above is based. 



Major Notations. We denote by $\Sigma^* := \bigcup_{t \ge 0} \Sigma^t$ the set of all strings over
the alphabet $\Sigma$, where $\Sigma^0 = \{\epsilon\}$ with $\epsilon$ the empty string. We write
$v, w$ for elements of $\Sigma^*$ and $vw$ for their concatenation. Throughout this paper, we write

$$p_X(v = a_1 \ldots a_n) := P(\{X_1 = a_1, \ldots, X_n = a_n\}) \qquad (1.1)$$

for the probability that the stochastic process $(X_t)$ generates the string $v \in \Sigma^n$ (for
technical convenience we let stochastic processes start at $t = 1$) and we simply write
$p = p_X$ if this cannot lead to confusion. We write $'$ for matrix transposition throughout.
Note finally that none of our algebraic arguments exceeds an elementary level; see [10] for
an appropriate textbook.
2. Algebraic Statistical Models

Definition 2.1. Following [31], an algebraic statistical model with $m$ parameters for
strings of length $n$ over an alphabet $\Sigma$ is a map

$$f : \mathbb{C}^m \to \mathbb{C}^{|\Sigma|^n}, \qquad z = (z_1, \ldots, z_m) \mapsto f(z) = (f_v(z_1, \ldots, z_m))_{v \in \Sigma^n}$$

where the $f_v \in \mathbb{C}[Z_1, \ldots, Z_m]$, $v \in \Sigma^n$, are polynomials in the
indeterminates $Z_1, \ldots, Z_m$ and there is a parameter set $\mathcal{S} \subset \mathbb{C}^m$
(usually $\mathcal{S} \subset \mathbb{R}^m$) such that for $z \in \mathcal{S}$

$$p_z : \Sigma^n \to [0,1], \qquad v \mapsto f_v(z) \qquad (2.1)$$

is a probability distribution and such that $\mathbb{C}^m$ is the natural extension of the
parameter set $\mathcal{S}$ to a complex affine space.

For the following explanations, we recall that varieties $V \subset \mathbb{C}^n$ correspond to
radical ideals $I \subset \mathbb{C}[X_1, \ldots, X_n]$ insofar as $V$ is the set of zeros of all
polynomials in $I$ [10]. We also recall that an ideal $I$ is prime iff $xy \in I$ implies
$x \in I$ or $y \in I$, and that in terms of the above-mentioned correspondence prime ideals have
irreducible varieties as counterparts. It is a well-known fact (e.g. [31, Th. 3.14]) that
$f(\mathbb{C}^m)$, as the image of a complex-valued polynomial map, is a Boolean combination of
varieties. In particular, its topological closure $V_f = \overline{f(\mathbb{C}^m)}$ is an
irreducible algebraic variety in $\mathbb{C}^{|\Sigma|^n}$ which corresponds to the prime ideal
$I_f \subset \mathbb{C}[p_v \mid v \in \Sigma^n]$, where we write $p_v$ for indeterminates to stress
that they are associated with probability distributions over strings $v \in \Sigma^n$. We will write
$P$ or $(p(v))_{v \in \Sigma^n}$ for the points in complex affine space $\mathbb{C}^{|\Sigma|^n}$.
Polynomials $g \in I_f$ are referred to as (model) invariants, and the goal of an algebraic
statistical treatment usually is to characterize or even explicitly list these invariants.
See [15, 16, 31] for related textbooks.



2.1. Algebraic Stochastic Process Models 

When dealing with stochastic processes $(X_t)$, a helpful auxiliary observation is that

$$p_X(a_1 \ldots a_m) = \sum_{b_1 \ldots b_{n-m} \in \Sigma^{n-m}} p_X(a_1 \ldots a_m b_1 \ldots b_{n-m}). \qquad (2.2)$$

As a consequence, one can make use of indeterminates $p_u$ for strings $u$ of length $m$ shorter
than $n$ when examining polynomial relationships in $\mathbb{C}[p_v, v \in \Sigma^n]$. The computation

$$p_u = \sum_{w \in \Sigma^{n-m}} p_{uw} \qquad (2.3)$$

reveals them as polynomials in the $p_v$, $v \in \Sigma^n$, such that no elimination is necessary.
This relationship for stochastic processes is crucial for this work. We emphasize this with
a definition.
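
As a small illustration of (2.3), the following Python sketch recovers the marginal $p_u$ of a shorter string from a distribution over strings of length $n$. The function name `marginal` and the toy distribution `p2` are hypothetical and only serve as an example; they are not part of the treatment above.

```python
from itertools import product

def marginal(p_n, alphabet, n, u):
    """Return p_u as the sum of p_{uw} over all w of length n - len(u), cf. (2.3)."""
    return sum(p_n[u + "".join(w)] for w in product(alphabet, repeat=n - len(u)))

# Hypothetical toy distribution over binary strings of length n = 2.
p2 = {"00": 0.1, "01": 0.2, "10": 0.3, "11": 0.4}

p_0 = marginal(p2, "01", 2, "0")    # p_0 = p_00 + p_01
p_eps = marginal(p2, "01", 2, "")   # empty string: sum over all of Sigma^2
```

Since every shorter-string probability is such a linear expression in the length-$n$ probabilities, no additional unknowns have to be introduced.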



A. Schönhuth: Generic Identification of Binary-Valued Hidden Markov Processes






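Definition 2.2 (Algebraic Stochastic Process Model). A family of algebraic statistical
models

$$(f_n : \mathbb{C}^m \to \mathbb{C}^{|\Sigma|^n})_{n \in \mathbb{N}} \qquad (2.4)$$

is called an algebraic (stochastic) process model if for all $1 \le m \le n$ and $u \in \Sigma^m$:

$$f_{m,u} = \sum_{w \in \Sigma^{n-m}} f_{n,uw}, \qquad (2.5)$$

where $f_{n,v}$ denotes the component of $f_n$ that belongs to the string $v \in \Sigma^n$.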

2.2. Note on Stationarity 

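A process $(X_t)$ which takes values in $\Sigma$ is stationary if and only if

$$\sum_{a \in \Sigma} p_X(a_1 \ldots a_{n-1} a) = \sum_{a \in \Sigma} p_X(a a_1 \ldots a_{n-1}) \qquad (2.6)$$

for all $v = a_1 \ldots a_{n-1} \in \Sigma^{n-1}$. Let $\mathcal{X}$ be a class of stochastic processes
which gives rise to the process model $(f_{\mathcal{X},n})_{n \in \mathbb{N}}$. When studying the variety

$$V_{f_{\mathcal{X},n}} = V(I_{f_{\mathcal{X},n}}) \qquad (2.7)$$

associated with the string length $n$ probability distributions, the stationary distributions
among them are associated with the subvariety [$\langle f_j, j \in J \rangle$ denotes the ideal
generated by polynomials $f_j$, $j \in J$, and $+$ is for addition of ideals]

$$V\Big(I_{f_{\mathcal{X},n}} + \Big\langle \sum_{a \in \Sigma} p_{va} - \sum_{a \in \Sigma} p_{av},\; v \in \Sigma^{n-1} \Big\rangle\Big) \qquad (2.8)$$

which, unless the processes in $\mathcal{X}$ are stationary by definition, establishes that
stationary processes form a lower-dimensional subvariety in $V_{f_{\mathcal{X},n}}$.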

As pointed out in the Introduction, stationarity is a ubiquitous assumption in all major
previous work. While the extent to which earlier treatments depend on it remains unclear^1,
stationarity has geometric implications: by (2.8), stationary HMPs only form a null set
among all HMPs. See also remark 6.6 later in the text.
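
Condition (2.6) is a finite set of linear equations on a length-$n$ distribution and can be checked directly. The following Python sketch does so; the function name `satisfies_stationarity` and both toy distributions are hypothetical, and the numerical tolerance is an assumption made for floating-point input.

```python
from itertools import product

def satisfies_stationarity(p_n, alphabet, n, tol=1e-12):
    """Check (2.6) for every v in alphabet^(n-1) on a distribution over alphabet^n."""
    for v in map("".join, product(alphabet, repeat=n - 1)):
        drop_last = sum(p_n[v + a] for a in alphabet)    # left-hand side of (2.6)
        drop_first = sum(p_n[a + v] for a in alphabet)   # right-hand side of (2.6)
        if abs(drop_last - drop_first) > tol:
            return False
    return True

# Hypothetical examples over {0,1}^2: an iid distribution and a skewed one.
iid = {"00": 0.09, "01": 0.21, "10": 0.21, "11": 0.49}
skewed = {"00": 0.1, "01": 0.2, "10": 0.3, "11": 0.4}

iid_ok = satisfies_stationarity(iid, "01", 2)
skewed_ok = satisfies_stationarity(skewed, "01", 2)
```

The skewed distribution violates (2.6) already for $v = 0$, which illustrates that a generic point of the variety is not stationary.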

In practical applications, it is very often essential to allow for processes that are not
stationary. This becomes evident in particular in application domains where HMPs or their
close derivatives have established "gold standards", for example speech recognition [34],
protein classification (through profile HMMs) [17], gene finding [8] and gene expression
time-course analysis [35]. Therefore a general treatment of HMP identification is certainly
desirable.



^1 For example, the precise extent to which [38] depends on it is difficult to determine; [22] base
their approach on Kullback-Leibler divergence computations, which is possible only in case of
stationary processes.
3. Processes

3.1. Finitary Processes

Finitary processes emerged in the above-mentioned early work on HMP identification
[6, 12-14, 25, 27] and have remained a core concept also in recent work on identifiability
[22, 38]. Finitary processes were later also referred to as linearly dependent [28], observable
operator models [29] or finite-dimensional [20, 36]. In what is possibly their most prominent
application, they served to decide equivalence of hidden Markov processes (HMPs) in 1992 [28];
the exponential-runtime algorithm given there was later improved to a polynomial-runtime
solution [21].

Definition 3.1 (Finitary Process). A stochastic process $(X_t)$ is said to be finitary iff
there are matrices $T_a \in \mathbb{R}^{d \times d}$ for all $a \in \Sigma$ with
$(\sum_{a \in \Sigma} T_a) 1 = 1$ (that is, $\sum_{a \in \Sigma} T_a$ has unit row sums) and a
vector $\pi \in \mathbb{R}^d$ whose entries sum up to one ($\pi' 1 = 1$) such that

$$P(\{X_1 = a_1, \ldots, X_n = a_n\}) = \pi' T_{a_1} \cdots T_{a_n} 1 \qquad (3.1)$$

where $1 = (1, \ldots, 1)' \in \mathbb{R}^d$ is the vector of all ones. The parametrization
$((T_a)_{a \in \Sigma}, \pi)$ is referred to as $d$-dimensional in case of $\pi \in \mathbb{R}^d$
and $T_a \in \mathbb{R}^{d \times d}$ for all $a \in \Sigma$.

It is an immediate observation that a finitary process which admits a d-dimensional 
parametrization also admits a parametrization of dimension d+1. Therefore the following 
definition makes sense. 

Definition 3.2 (Rank of a Finitary Process). The rank of a finitary process (Xt) is the 
minimal dimension of a parametrization that it admits. 

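We conclude by providing a condition which is necessary for rank $d$ finitary processes.
For further reference, we use the notation

$$T_v := T_{a_1} T_{a_2} \cdots T_{a_{n-1}} T_{a_n} \in \mathbb{R}^{d \times d} \qquad (3.2)$$

for any $v = a_1 \ldots a_n \in \Sigma^n$.

Proposition 3.3. Let $(X_t)$ be a finitary process of rank $d$ and let $n \in \mathbb{N}$ be an
arbitrary integer. Then it holds that

$$\mathrm{rk}\, [p_X(v_i w_j)]_{1 \le i,j \le n} \le d \qquad (3.3)$$

for all choices of strings $v_1, \ldots, v_n, w_1, \ldots, w_n \in \Sigma^*$.

Proof. Let $((T_a)_{a \in \Sigma}, \pi)$ be a $d$-dimensional parametrization of $(X_t)$. We observe that

$$p_X(v_i w_j) = (\pi' T_{v_i})(T_{w_j} 1). \qquad (3.4)$$

Since $\pi' T_{v_i} \in \mathbb{R}^{1 \times d}$ and $T_{w_j} 1 \in \mathbb{R}^{d \times 1}$, the
matrix in (3.3) is the product of an $n \times d$ and a $d \times n$ matrix, and the claim becomes
obvious. □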
3.2. Hidden Markov Processes

Definition 3.4 (Hidden Markov process). A hidden Markov process (HMP) $(X_t)$ on $d$
hidden states [we write $s, \tilde{s} \in \{1, \ldots, d\}$ or, if more convenient and not leading
to confusion with other indices, $i, j$ for hidden states] which takes values in $\Sigma$ is
parametrized by a tuple $\theta = (M, E, \pi)$ where

1. $M = [m_{s\tilde{s}}] \in \mathbb{R}^{d \times d}$ is a non-negative transition probability
matrix with unit row sums $\sum_{\tilde{s}=1}^{d} m_{s\tilde{s}} = 1$ (i.e. the row vectors of $M$
are probability distributions over the hidden states)

2. $E = [e_{sa}] \in \mathbb{R}^{d \times |\Sigma|}$ is a non-negative emission probability matrix
with unit row sums $\sum_{a \in \Sigma} e_{sa} = 1$ (i.e. the row vectors of $E$ are probability
distributions over $\Sigma$)

3. $\pi$ is an initial probability distribution over the hidden states

We write

$$\mathcal{H}_{d,+} := \Big\{(M, E, \pi) \;\Big|\; \sum_{\tilde{s}} m_{s\tilde{s}} = \sum_{a} e_{sa} = \sum_{s} \pi_s = 1\Big\} \subset \mathbb{R}_+^{d^2 + d|\Sigma| + d} \qquad (3.5)$$

for the set of HMP parametrizations. We refer to $\mathcal{H}_{d,+}$ as the stochastic parametrizations.

The name stochastic parametrizations serves to distinguish them from more relaxed,
complex-valued parameter sets whose definition will follow. Note that

$$\dim \mathcal{H}_{d,+} = d(d-1) + d(|\Sigma| - 1) + (d - 1) = d^2 + d(|\Sigma| - 1) - 1 \qquad (3.6)$$

which means that $\mathcal{H}_{d,+}$ can be considered a full-dimensional subset of
$\mathbb{R}_+^{d^2 + d(|\Sigma|-1) - 1}$. A HMP $(X_t)$ on $d$ hidden states as parametrized by
$(M, E, \pi)$ proceeds by initially moving to a state $s \in \{1, \ldots, d\}$ with probability
$\pi_s$ and emitting the symbol $X_1 = a$ with probability $e_{sa}$. Then one moves from $s$ to a
state $\tilde{s}$ with probability $m_{s\tilde{s}}$ and emits the symbol $X_2 = b$ with probability
$e_{\tilde{s}b}$, and so on.

We further observe that $M$ decomposes as $M = \sum_{a \in \Sigma} T_a$ where

$$(T_a)_{s\tilde{s}} := e_{sa} m_{s\tilde{s}} \qquad (3.7)$$

which reflect the probabilities to emit symbol $a$ from state $s$ and subsequently to move on
to state $\tilde{s}$, and we use the notation

$$O_a := \mathrm{diag}(e_{1a}, \ldots, e_{da}) \quad \text{such that} \quad T_a = O_a M. \qquad (3.8)$$

Consequently, we also write $\theta = (M, (O_a)_{a \in \Sigma}, \pi)$ for HMP parametrizations. In
analogy to finitary process notation, we furthermore write

$$T_v := T_{a_1} T_{a_2} \cdots T_{a_{n-1}} T_{a_n} = O_{a_1} M O_{a_2} M \cdots O_{a_{n-1}} M O_{a_n} M \in \mathbb{R}^{d \times d} \qquad (3.9)$$

for any $v = a_1 \ldots a_n \in \Sigma^n$. Standard technical computations (see remark 3.5 below) then
reveal that, for $v = a_1 \ldots a_n \in \Sigma^n$,

$$p(v = a_1 \ldots a_n) = \pi' T_{a_1} \cdots T_{a_n} 1 = \pi' T_v 1, \qquad (3.10)$$

where $1 = (1, \ldots, 1)' \in \mathbb{R}^d$ is the vector of all ones.
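
Formula (3.10) translates into a short left-to-right matrix-vector computation. The following Python sketch evaluates it with exact rational arithmetic; the function name `hmp_probability`, the encoding of symbols as column indices of $E$, and the 2-state parameter values are hypothetical choices made for illustration only.

```python
from fractions import Fraction as F
from itertools import product

def hmp_probability(M, E, pi, v):
    """Evaluate p(v) = pi' T_{a_1} ... T_{a_n} 1 with T_a = O_a M, cf. (3.8)-(3.10)."""
    d = len(pi)
    row = list(pi)                                   # current row vector pi' T_{a_1} ... T_{a_k}
    for a in v:
        emitted = [row[s] * E[s][a] for s in range(d)]                          # right-multiply by O_a
        row = [sum(emitted[s] * M[s][t] for s in range(d)) for t in range(d)]   # then by M
    return sum(row)                                  # right-multiply by the all-ones vector

# Hypothetical 2-state HMP over the alphabet {0, 1}; symbols index the columns of E.
M = [[F(1, 2), F(1, 2)], [F(1, 4), F(3, 4)]]
E = [[F(9, 10), F(1, 10)], [F(1, 5), F(4, 5)]]
pi = [F(1, 3), F(2, 3)]

p_0 = hmp_probability(M, E, pi, (0,))
total = sum(hmp_probability(M, E, pi, v) for v in product(range(2), repeat=3))
```

Because $M$, $E$ and $\pi$ have unit row sums, the probabilities of all strings of a fixed length add up to one, which the variable `total` reflects for length 3.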

Remark 3.5. Computation of the vectors $\pi' T_v \in \mathbb{R}^{1 \times d}$ and
$T_v 1 \in \mathbb{R}^{d \times 1}$ reflects the well-known Forward and Backward algorithms
(e.g. [18]) for computation of HMP probabilities, since the entries of these vectors are just the
Forward and Backward variables, that is, for $v = a_1 \ldots a_n$,

$$(\pi' T_v)_s = \Pr(X_1 = a_1, \ldots, X_n = a_n, S_{n+1} = s) \qquad (3.11)$$

$$(T_v 1)_s = \Pr(X_t = a_1, \ldots, X_{t+n-1} = a_n \mid S_t = s) \qquad (3.12)$$

where $(S_t)$ is the (non-observable) Markov process which takes values in the hidden states
$\{1, \ldots, d\}$.
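
The following Python sketch illustrates how the two vectors of the remark combine: the probability of a concatenated string $vw$ splits into the product of the row vector $\pi' T_v$ and the column vector $T_w 1$, which is the factorization (3.4). The function names `forward` and `backward` and the parameter values are hypothetical.

```python
from fractions import Fraction as F

def forward(M, E, pi, v):
    """Row vector pi' T_v, built left to right (Forward variables)."""
    d = len(pi)
    row = list(pi)
    for a in v:
        emitted = [row[s] * E[s][a] for s in range(d)]
        row = [sum(emitted[s] * M[s][t] for s in range(d)) for t in range(d)]
    return row

def backward(M, E, w):
    """Column vector T_w 1, built right to left (Backward variables)."""
    d = len(M)
    col = [F(1)] * d
    for a in reversed(w):
        moved = [sum(M[s][t] * col[t] for t in range(d)) for s in range(d)]   # M * col
        col = [E[s][a] * moved[s] for s in range(d)]                           # O_a * (M * col)
    return col

# Hypothetical 2-state HMP over {0, 1}; symbols index the columns of E.
M = [[F(1, 2), F(1, 2)], [F(1, 4), F(3, 4)]]
E = [[F(9, 10), F(1, 10)], [F(1, 5), F(4, 5)]]
pi = [F(1, 3), F(2, 3)]

v, w = (0, 1), (1, 0)
split = sum(f * b for f, b in zip(forward(M, E, pi, v), backward(M, E, w)))
direct = sum(forward(M, E, pi, v + w))   # p(vw) computed in one pass
```

Both routes give the same value, so any matrix of probabilities $p(v_i w_j)$ factors through the $d$-dimensional state space.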

(3.10) makes it obvious that a HMP on $d$ hidden states is a finitary process which
admits a $d$-dimensional parametrization. This allows the following definition.

Definition 3.6 (Rank of a Hidden Markov Process). The rank of a hidden Markov process
$(X_t)$ is its rank as a finitary process.

The definition gives rise to the following trivial proposition. 

Proposition 3.7. A hidden Markov process acting on d hidden states is a finitary process 
of rank at most d. 

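Example 3.8. As examples for parametrizations of hidden Markov processes on $d$ hidden
states with rank $d$, for arbitrary alphabets $\Sigma$ of size $|\Sigma| \ge 2$, let $a \in \Sigma$
be a letter and let $\lambda_1, \ldots, \lambda_d \in (0, 1)$ be pairwise different. Consider HMP
parametrizations $(M, E, \pi)$ where

$$M = \mathrm{Id} \in \mathbb{R}^{d \times d}, \qquad \pi = (\tfrac{1}{d}, \ldots, \tfrac{1}{d})' = \tfrac{1}{d} 1 \in \mathbb{R}^d \qquad (3.13)$$

and

$$O_a = \mathrm{diag}(\lambda_1 := e_{1a}, \ldots, \lambda_d := e_{da}). \qquad (3.14)$$

The $O_b = \mathrm{diag}(e_{1b}, \ldots, e_{db})$, $b \in \Sigma \setminus \{a\}$, can be chosen
arbitrarily. Observe that

$$S(\lambda) := \begin{pmatrix} 1' O_a^0 \\ \vdots \\ 1' O_a^{d-1} \end{pmatrix}
= \begin{pmatrix} 1 & \ldots & 1 \\ \lambda_1 & \ldots & \lambda_d \\ \vdots & & \vdots \\ \lambda_1^{d-1} & \ldots & \lambda_d^{d-1} \end{pmatrix}
= [\lambda_j^{i-1}]_{1 \le i,j \le d} \in \mathbb{R}^{d \times d} \qquad (3.15)$$

forms a Vandermonde matrix and hence is invertible. It follows that [we write
$a^i := a \ldots a \in \Sigma^i$]

$$[p(a^{i-1} a^{j-1})]_{1 \le i,j \le d} = [\tfrac{1}{d} \cdot 1' O_a^{i-1} O_a^{j-1} 1]_{1 \le i,j \le d} = \tfrac{1}{d} \cdot S(\lambda) S(\lambda)' \in \mathbb{R}^{d \times d} \qquad (3.16)$$

is an invertible matrix. By proposition 3.3, the hidden Markov process with parametrization
$(M, (O_a)_{a \in \Sigma}, \pi)$ has rank $d$.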
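
The rank claim of the example can be checked numerically on small instances. The following Python sketch builds the matrix in (3.16) from $p(a^k) = \frac{1}{d} \sum_s \lambda_s^k$ and computes its rank exactly; the helper names `rank` and `example_hankel` and the concrete values of $\lambda$ are hypothetical.

```python
from fractions import Fraction as F

def rank(rows):
    """Rank via Gaussian elimination; exact when the entries are Fractions."""
    m = [list(r) for r in rows]
    rk = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(rk, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[rk], m[piv] = m[piv], m[rk]
        for i in range(rk + 1, len(m)):
            factor = m[i][c] / m[rk][c]
            m[i] = [x - factor * y for x, y in zip(m[i], m[rk])]
        rk += 1
    return rk

def example_hankel(lams):
    """[p(a^{i-1} a^{j-1})] for M = Id, pi = (1/d) 1, O_a = diag(lams), cf. (3.16)."""
    d = len(lams)
    p_power = lambda k: sum(lam ** k for lam in lams) / d   # p(a^k) = (1/d) sum_s lambda_s^k
    return [[p_power(i + j) for j in range(d)] for i in range(d)]

distinct = example_hankel([F(1, 2), F(1, 3), F(1, 5)])   # pairwise different values
repeated = example_hankel([F(1, 2), F(1, 2), F(1, 5)])   # two values coincide

rank_distinct = rank(distinct)
rank_repeated = rank(repeated)
```

With pairwise different values the matrix has full rank $d = 3$, whereas a repeated value makes two columns of $S(\lambda)$ coincide and the rank drops.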
4. Models

4.1. Finitary Models

Finitary models are the algebraic statistical equivalent of finitary processes.

Definition 4.1. Finitary models $f_{\mathcal{M}_d,n}$ are the polynomial maps

$$f_{\mathcal{M}_d,n} : \mathcal{M}_d \to \mathbb{C}^{|\Sigma|^n}, \qquad ((T_a)_{a \in \Sigma}, \pi) \mapsto (\pi' T_v 1)_{v \in \Sigma^n} \qquad (4.1)$$

where

$$\mathcal{M}_d := \Big\{((T_a)_{a \in \Sigma}, \pi) \in \mathbb{C}^{|\Sigma| d^2 + d} \;\Big|\; \sum_{a \in \Sigma} T_a 1 = 1\Big\} \cong \mathbb{C}^{|\Sigma| d^2}. \qquad (4.2)$$

We write

$$V_{\mathcal{M}_d,n} := \overline{\mathrm{Im}\, f_{\mathcal{M}_d,n}}$$

for the variety which is associated with $f_{\mathcal{M}_d,n}$ and $I_{\mathcal{M}_d,n}$ for the
ideal of its invariants. Unlike in the definition of finitary processes, we do not require
that $\pi' 1 = 1$, which would translate to adding the inhomogeneous invariant
$\sum_v p_v = 1$ to $I_{\mathcal{M}_d,n}$.

The relationship $\sum_a T_a 1 = 1$ yields that the family
$(f_{\mathcal{M}_d,n})_{n \in \mathbb{N}}$ is an algebraic process model.

Proposition 4.2. The family $(f_{\mathcal{M}_d,n})_{n \in \mathbb{N}}$ is an algebraic process model.

Proof. Let $v \in \Sigma^m$. Writing $M := \sum_a T_a$, so that $M 1 = 1$, we observe

$$f_{\mathcal{M}_d,m}(\theta)_v = \pi' T_v 1 = \pi' T_v M^{n-m} 1 = \sum_{w \in \Sigma^{n-m}} f_{\mathcal{M}_d,n}(\theta)_{vw}. \qquad (4.3)$$

□

By the definition of finitary process models one can further register:

Proposition 4.3. For all $d, n \in \mathbb{N}$:

$$\mathrm{Im}\, f_{\mathcal{M}_d,n} \subset \mathrm{Im}\, f_{\mathcal{M}_{d+1},n}. \qquad (4.4)$$

Proof. One can extend $d$-dimensional matrices by zero entries to obtain a
$(d+1)$-dimensional parametrization, which reflects that every finitary process with a
$d$-dimensional parametrization also admits a $(d+1)$-dimensional parametrization. □
4.2. Hidden Markov Models

We obtain an algebraic statistical treatment of HMPs by allowing the parameters in
$M$, $E$ and $\pi$ to be complex, and we write

$$\mathcal{H}_d := \Big\{(M, E, \pi) \in \mathbb{C}^{d^2 + d|\Sigma| + d} \;\Big|\; \sum_{\tilde{s}} m_{s\tilde{s}} = 1, \sum_{a} e_{sa} = 1\Big\} \cong \mathbb{C}^{d^2 + d(|\Sigma| - 1)} \qquad (4.5)$$

for the resulting set of parameters. Note that we still require unit row sums in both $M$
and $E$ while we do not make any such assumption for $\pi$. The unit row sum assumption
for $E$ implies that still

$$M = \sum_{a \in \Sigma} T_a \quad \text{where} \quad (T_a)_{s\tilde{s}} = e_{sa} m_{s\tilde{s}} \qquad (4.6)$$

while the unit row sum assumption on $M$ implies that $M 1 = 1$, hence (let $v \in \Sigma^m$ and
$m \le n$)

$$p(v) = \pi' T_v 1 = \pi' T_v M^{n-m} 1 = \sum_{u \in \Sigma^{n-m}} p(vu) \qquad (4.7)$$

a relationship which holds for stochastic processes in general. Note that

$$\dim \mathcal{H}_d = d^2 + d(|\Sigma| - 1) = \dim \mathcal{H}_{d,+} + 1 \qquad (4.8)$$

which is explained by the fact that in $\mathcal{H}_d$ we do not require that the entries of $\pi$
sum up to one; just as in the case of finitary models, we avoid the introduction of the
non-homogeneous invariant $\sum_v p_v = 1$ for technical convenience.

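Definition 4.4. We call

$$f_{\mathcal{H}_d,n} : \mathcal{H}_d \to \mathbb{C}^{|\Sigma|^n}, \qquad \theta = (M, E, \pi) = ((T_a = O_a M)_{a \in \Sigma}, \pi) \mapsto (\pi' T_v 1)_{v \in \Sigma^n} \qquad (4.9)$$

a hidden Markov model for $d$ hidden states and string length $n$.

The relationship (4.7) yields further:

Proposition 4.5. The family $(f_{\mathcal{H}_d,n})_{n \in \mathbb{N}}$ of hidden Markov models for
$d$ hidden states is an algebraic process model.

We write

$$V_{\mathcal{H}_d,n} := \overline{f_{\mathcal{H}_d,n}(\mathcal{H}_d)} \qquad (4.10)$$

for the algebraic variety that is associated with $f_{\mathcal{H}_d,n}$ and
$I_{\mathcal{H}_d,n}$ for the ideal of its invariants.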
Proposition 4.6. For all $d, n \in \mathbb{N}$:

(a) $\mathrm{Im}\, f_{\mathcal{H}_d,n} \subset \mathrm{Im}\, f_{\mathcal{H}_{d+1},n}$

(b) $\mathrm{Im}\, f_{\mathcal{H}_d,n} \subset \mathrm{Im}\, f_{\mathcal{M}_d,n}$

(c) $V_{\mathcal{H}_d,n} \subset V_{\mathcal{M}_d,n}$

(d) $I_{\mathcal{M}_d,n} \subset I_{\mathcal{H}_d,n}$

While (a) reflects that HMPs on $d+1$ hidden states encompass the HMPs on $d$ hidden
states, (d) means that each invariant of a finitary model also applies to the corresponding
hidden Markov model. (d) is a key observation for this work.

Proof. (a) holds because one can extend matrices by zero entries, thereby obtaining
higher-dimensional parametrizations; (b) is obvious by the definitions of hidden Markov
and finitary process models, while (c) immediately follows from (b). Finally, (c) and (d) are
equivalent due to elementary algebraic geometric arguments [10]. □



5. Dimension 

5.1. Finitary Models 

In this section we compute the dimension of the variety $V_{\mathcal{M}_d,n}$ for
$n \ge 2d - 1$. The key insight for this computation is the following lemma.

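Lemma 5.1. Let $n \ge 2d - 1$ and let $\theta := ((T_a)_{a \in \Sigma}, \pi)$,
$\tilde{\theta} := ((\tilde{T}_a)_{a \in \Sigma}, \tilde{\pi}) \in \mathcal{M}_d$ be two
parametrizations giving rise to finitary processes. Consider the following two statements:

(i)

$$f_{\mathcal{M}_d,n}(\theta) = f_{\mathcal{M}_d,n}(\tilde{\theta}) \qquad (5.1)$$

(ii) There exists an invertible linear map $S : \mathbb{C}^d \to \mathbb{C}^d$ such that

$$S 1 = 1, \quad \tilde{\pi}' = \pi' S \quad \text{and} \quad \forall a \in \Sigma : \tilde{T}_a = S^{-1} T_a S \qquad (5.2)$$

Then (ii) implies (i), and the two statements are equivalent if

$$f_{\mathcal{M}_d,n}(\theta), f_{\mathcal{M}_d,n}(\tilde{\theta}) \notin \mathrm{Im}\, f_{\mathcal{M}_{d-1},n} \qquad (5.3)$$

which means that $\theta, \tilde{\theta}$ give rise to finitary processes of rank $d$.

Proof. While (ii) $\Rightarrow$ (i) is obvious, (i) $\Rightarrow$ (ii) is a straightforward
generalization of statements presented in previous works (e.g. [28, 29]) to complex-valued
parameters $\theta, \tilde{\theta}$. □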
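
The direction (ii) $\Rightarrow$ (i) of the lemma can be illustrated on a toy instance. The following Python sketch conjugates a parametrization by an invertible matrix $S$ with $S1 = 1$ and compares the resulting string probabilities; the helper names, the parametrization and the choice of $S$ are hypothetical, and the inverse of $S$ is written out by hand rather than computed.

```python
from fractions import Fraction as F
from itertools import product

def matmul(A, B):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def prob(Ts, pi, v):
    """pi' T_{a_1} ... T_{a_n} 1 for a finitary parametrization ((T_a), pi)."""
    row = [list(pi)]
    for a in v:
        row = matmul(row, Ts[a])
    return sum(row[0])

# Hypothetical 2-dimensional parametrization (T_0 + T_1 has unit row sums).
Ts = {0: [[F(9, 20), F(9, 20)], [F(1, 20), F(3, 20)]],
      1: [[F(1, 20), F(1, 20)], [F(1, 5), F(3, 5)]]}
pi = [F(1, 3), F(2, 3)]

# An invertible S with S1 = 1 (unit row sums) and its inverse.
S = [[F(2), F(-1)], [F(1), F(0)]]
S_inv = [[F(0), F(1)], [F(-1), F(2)]]

Ts_new = {a: matmul(matmul(S_inv, T), S) for a, T in Ts.items()}   # S^{-1} T_a S
pi_new = matmul([pi], S)[0]                                         # pi' S

same = all(prob(Ts, pi, v) == prob(Ts_new, pi_new, v) for v in product((0, 1), repeat=3))
```

The transformed matrices need not be non-negative, which is one reason why the models are defined over complex parameter sets rather than over stochastic parametrizations only.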

Lemma 5.1 enables application of a well-known theorem [26, Th. 11.12] for computing 
dimensions of varieties. 
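Theorem 5.2. Let $f_{\mathcal{M}_d,n}$ be as in definition 4.1 such that $n \ge 2d - 1$. Then

$$\dim V_{\mathcal{M}_d,n} = \begin{cases} 1 & |\Sigma| = 1 \\ (|\Sigma| - 1) d^2 + d & |\Sigma| \ge 2 \end{cases} \qquad (5.4)$$

Proof. The case $|\Sigma| = 1$ is trivial: $\mathrm{Im}\, f_{\mathcal{M}_d,n} = \mathbb{C}$ for all
$n, d$. In case $|\Sigma| \ge 2$, we note that stochastic processes of rank $d$ exist (see
example 3.8 above), that is, there is

$$\theta \in \mathcal{M}_d \quad \text{such that} \quad f_{\mathcal{M}_d,n}(\theta) \notin \mathrm{Im}\, f_{\mathcal{M}_{d-1},n}. \qquad (5.5)$$

Lemma 5.1 states that $f_{\mathcal{M}_d,n}(\theta) = f_{\mathcal{M}_d,n}(\tilde{\theta})$ if and only
if there is an invertible linear map $S \in \mathbb{C}^{d \times d}$ with $S1 = 1$ which transforms
$\theta$ into $\tilde{\theta}$ as described in the lemma. This means that the fiber
$f_{\mathcal{M}_d,n}^{-1}(f_{\mathcal{M}_d,n}(\theta))$ has dimension equal to that of the space of
invertible linear maps $S$ with $S1 = 1$, which is $d(d-1)$. In case
$f_{\mathcal{M}_d,n}(\theta) \in \mathrm{Im}\, f_{\mathcal{M}_{d-1},n}$, lemma 5.1 states that the
existence of an invertible linear map $S$ with $S1 = 1$ which transforms $\theta$ into another point
$\tilde{\theta} \in f_{\mathcal{M}_d,n}^{-1}(f_{\mathcal{M}_d,n}(\theta))$ is only a sufficient
condition. This implies further that
$\dim f_{\mathcal{M}_d,n}^{-1}(f_{\mathcal{M}_d,n}(\theta)) \ge d(d-1)$.

The statement of the theorem is finally obtained by application of [26, Th. 11.12]. Due
to standard arguments [31], the closure of the image of $f_{\mathcal{M}_d,n}$ is a quasi-projective
variety. Plugging $\mathcal{M}_d, f_{\mathcal{M}_d,n}$ here into $X, \pi$ in [26, Th. 11.12] yields

$$\dim V_{\mathcal{M}_d,n} = \dim \overline{f_{\mathcal{M}_d,n}(\mathcal{M}_d)} = \dim \mathcal{M}_d - d(d-1) = (|\Sigma| - 1) d^2 + d \qquad (5.6)$$

as was claimed. □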



5.2. Hidden Markov Models 
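Lemma 5.3. Let

$$\mathcal{N}_d := f_{\mathcal{H}_d,n}^{-1}(V_{\mathcal{M}_{d-1},n}) \cup \mathcal{H}_{d,0} \qquad (5.7)$$

where $\mathcal{H}_{d,0} \subset \mathcal{H}_d$ encompasses all parametrizations
$\theta = (M, E, \pi) = (M, (O_a)_{a \in \Sigma}, \pi)$ such that

• $M$ is not invertible or

• there is no $a \in \Sigma$ such that the eigenvalues of $O_a$ are pairwise different.

By definition, $f_{\mathcal{H}_d,n}^{-1}(V_{\mathcal{M}_{d-1},n})$ covers the HMM parametrizations
on $d$ hidden states whose rank is less than $d$. Obviously $\mathcal{N}_d$ forms a variety where

$$\dim \mathcal{N}_d < \dim \mathcal{H}_d = d^2 + d(|\Sigma| - 1). \qquad (5.8)$$

Furthermore, for $\theta = (M, E, \pi) \in \mathcal{H}_d \setminus \mathcal{N}_d$

$$\mathrm{card}\, f_{\mathcal{H}_d,n}^{-1}(f_{\mathcal{H}_d,n}(\theta)) = d! < \infty. \qquad (5.9)$$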
Proof. Theorems 3.1 and 3.2 in [3] prove this for stationary processes. Our proof
consists in observing that the stationarity assumption in [3] is not used. Moreover, it is
straightforward to replace real values by complex values. □

Remark 5.4. [1] provide alternative arguments to prove identifiability of stationary HMPs
which, while stationarity is needed in [1], can easily be extended to non-stationary HMPs
and also to complex values. Note that [1] particularly focus on generic identifiability of
HMPs from their distributions over strings of length $n < 2d - 1$ for alphabets of size
$|\Sigma| > 2$. As they do not explicitly name the generic subsets, application of results
from [3] is more convenient here.

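Corollary 5.5. As real-valued varieties,

$$\dim(\mathcal{N}_d \cap \mathcal{H}_{d,+}) < \dim \mathcal{H}_{d,+} = \dim \mathcal{H}_d - 1 \qquad (5.10)$$

and

$$\mathrm{card}\, f_{\mathcal{H}_d,n}^{-1}(f_{\mathcal{H}_d,n}(\theta)) = d! \qquad (5.11)$$

for $\theta \in \mathcal{H}_{d,+} \setminus \mathcal{N}_d$.

Proof. The proof is completely analogous to that for lemma 5.3. The reduction in
dimension by 1 for $\mathcal{H}_{d,+}$ is due to not requiring $\sum_{i=1}^{d} \pi_i = 1$ for
$\theta \in \mathcal{H}_d$, see (3.6). □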



Remark 5.6 (Identification Algorithm: Workflow). Our algorithm, which solves the
identification problem 1.2 with the probability distribution $P : \Sigma^n \to [0, 1]$ as input,
proceeds in three steps:

1. Determine whether $P \in \mathrm{Im}\, f_{\mathcal{H}_d,n}$.

2. If yes, determine $\theta \in \mathcal{H}_d$ such that $f_{\mathcal{H}_d,n}(\theta) = P$.

3. If $\theta \in \mathcal{H}_d \setminus \mathcal{N}_d$, determine whether $\theta$ is real non-negative.

From this outer perspective, lemma 5.3 and corollary 5.5 are important elements when
addressing the third, final step. The first two steps can be performed by procedures
described in the subsequent sections 6 and 7.

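Theorem 5.7. Let $f_{\mathcal{H}_d,n}$ be as in definition 4.4 where $n \ge 2d - 1$. Then it holds that

$$\dim V_{\mathcal{H}_d,n} = \begin{cases} 1 & |\Sigma| = 1 \\ d^2 + (|\Sigma| - 1) d & |\Sigma| \ge 2 \end{cases} \qquad (5.12)$$

The proof again is an application of [26, Th. 11.12].

Proof of theorem 5.7. Let $\theta = ((T_a = O_a M)_{a \in \Sigma}, \pi) \in \mathcal{H}_d \setminus \mathcal{N}_d$.
(5.9) implies that

$$\dim f_{\mathcal{H}_d,n}^{-1}(f_{\mathcal{H}_d,n}(\theta)) = 0. \qquad (5.13)$$

Arguments based on [26, Th. 11.12] which are analogous to those for proving theorem 5.2
then further yield

$$\dim V_{\mathcal{H}_d,n} = \dim \overline{\mathrm{Im}\, f_{\mathcal{H}_d,n}} = \dim \mathbb{C}^{d^2 + d(|\Sigma| - 1)} - \dim f_{\mathcal{H}_d,n}^{-1}(f_{\mathcal{H}_d,n}(\theta)) = d^2 + (|\Sigma| - 1) d - 0. \qquad (5.14)$$

□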



Binary-Valued HMMs. In case of a two-letter alphabet $\Sigma$ we find

$$\dim V_{\mathcal{H}_d,n} = (|\Sigma| - 1) d + d^2 = d + d^2 = d + (|\Sigma| - 1) d^2 = \dim V_{\mathcal{M}_d,n}.$$

Since $V_{\mathcal{H}_d,n} \subset V_{\mathcal{M}_d,n}$ and both varieties are irreducible,
$V_{\mathcal{H}_d,n}$ and $V_{\mathcal{M}_d,n}$ coincide, which is a standard conclusion from
algebraic geometry [10, Prop. 10, p. 463]. Therefore, we obtain the following key insight.

Corollary 5.8. In case of $|\Sigma| = 2$

$$V_{\mathcal{H}_d,n} = V_{\mathcal{M}_d,n}. \qquad (5.15)$$

□

In conclusion, it suffices to study finitary models when addressing generic identification
of binary-valued HMPs.
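
The dimension count behind the corollary is easy to tabulate. The following Python sketch encodes the two formulas of theorems 5.2 and 5.7 as stated above; the function names are hypothetical, and the code only evaluates the formulas, it does not compute dimensions of varieties.

```python
def dim_finitary_variety(d, k):
    """dim V_{M_d,n} for alphabet size k and n >= 2d - 1, as stated in theorem 5.2."""
    return 1 if k == 1 else (k - 1) * d * d + d

def dim_hmm_variety(d, k):
    """dim V_{H_d,n} for alphabet size k and n >= 2d - 1, as stated in theorem 5.7."""
    return 1 if k == 1 else d * d + (k - 1) * d

# For binary alphabets (k = 2) both formulas give d^2 + d.
binary_agree = all(dim_finitary_variety(d, 2) == dim_hmm_variety(d, 2) for d in range(1, 11))

# For a ternary alphabet and d = 2 the finitary variety is strictly larger.
ternary_gap = dim_finitary_variety(2, 3) - dim_hmm_variety(2, 3)
```

For larger alphabets and $d \ge 2$ the two counts differ, so the dimension argument that identifies the two varieties is specific to the binary case.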

6. Invariants 

Computation of invariants for finitary models is made possible by a Hankel-matrix-based
characterization of finitary processes, corollaries of which will also shed light on the
relationship $n \ge 2d - 1$ in the formulation of problem 1.2.

6.1. The Hankel Matrix 

Definition 6.1. A string function $p : \Sigma^* \to \mathbb{C}$ such that

$$\forall v \in \Sigma^* : \sum_{a \in \Sigma} p(va) = p(v) \qquad (6.1)$$

is called a process function.

Note that $\sum_{a} p(va) = p(v)$ implies $\sum_{u \in \Sigma^m} p(vu) = p(v)$ for all
$m \in \mathbb{N}$, which parallels the definition of a process model. Note further that, by
standard arguments, string functions $p : \Sigma^* \to \mathbb{C}$ are associated with stochastic
processes if and only if

$$\forall v \in \Sigma^* : \sum_{a \in \Sigma} p(va) = p(v), \quad \sum_{a \in \Sigma} p(a) = 1 \quad \text{and} \quad p(\Sigma^*) \subset [0, 1]. \qquad (6.2)$$

Omitting $\sum_{a} p(a) = 1$ and $p(\Sigma^*) \subset [0, 1]$ in the definition of a process
function is for compatibility with algebraic process models, see Def. 2.2.
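Definition 6.2. Let $p : \Sigma^* \to \mathbb{C}$ be a string function.

• The matrix

$$\mathcal{P}_p := [p(vw)]_{v,w \in \Sigma^*} \in \mathbb{C}^{\Sigma^* \times \Sigma^*} \qquad (6.3)$$

is called the Hankel matrix of $p$ (also called prediction matrix in case of a process
function $p$, see e.g. [36]).

• We define

$$\mathrm{rk}\, p := \mathrm{rk}\, \mathcal{P}_p \qquad (6.4)$$

to be the rank of the string function $p$.

• In case of $\mathrm{rk}\, p < \infty$ the string function $p$ is said to be finitary.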

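Example 6.3. Let $p : \Sigma^* \to \mathbb{C}$ be a string function over the binary alphabet
$\Sigma = \{0, 1\}$. Then

$$\begin{pmatrix}
p(\epsilon) & p(0) & p(1) & p(00) & p(01) & p(10) & p(11) & \cdots \\
p(0) & p(00) & p(01) & p(000) & p(001) & p(010) & p(011) & \cdots \\
p(1) & p(10) & p(11) & p(100) & p(101) & p(110) & p(111) & \cdots \\
p(00) & p(000) & p(001) & p(0000) & p(0001) & p(0010) & p(0011) & \cdots \\
p(01) & p(010) & p(011) & p(0100) & p(0101) & p(0110) & p(0111) & \cdots \\
p(10) & p(100) & p(101) & p(1000) & p(1001) & p(1010) & p(1011) & \cdots \\
p(11) & p(110) & p(111) & p(1100) & p(1101) & p(1110) & p(1111) & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}$$

is the Hankel matrix, where strings of finite length have been ordered lexicographically.
See also [22] for examples.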
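
The layout of the example can be generated mechanically. The following Python sketch enumerates strings by length and then lexicographically and fills the upper-left block of the Hankel matrix with $p(vw)$; the helper names `strings_up_to` and `hankel_block` are hypothetical, and the identity function stands in for $p$ so that each entry simply shows the string whose probability would appear there.

```python
from itertools import product

def strings_up_to(alphabet, m):
    """All strings of length at most m, shorter strings first, then lexicographic."""
    return ["".join(t) for k in range(m + 1) for t in product(alphabet, repeat=k)]

def hankel_block(p, alphabet, m, n):
    """Upper-left block [p(vw)] with |v| <= m and |w| <= n."""
    rows = strings_up_to(alphabet, m)
    cols = strings_up_to(alphabet, n)
    return [[p(v + w) for w in cols] for v in rows]

words = strings_up_to("01", 2)
labels = hankel_block(lambda s: s, "01", 2, 2)   # entries are the concatenated strings vw
```

Replacing the identity function by an actual string function yields the numeric $7 \times 7$ block displayed in the example.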

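Example 6.4. The existence of $\rho_a \in [0, 1]$, $a \in \Sigma$, such that

$$p_X(a_1 \ldots a_n) = \rho_{a_1} \cdot \ldots \cdot \rho_{a_n} \qquad (6.5)$$

yields $\mathrm{rk}\, \mathcal{P}_{p_X} = 1$; in particular, $\mathrm{rk}\, \mathcal{P}_{p_X} = 1$
in case of iid processes $(X_t)$ (in fact, this is a characterization of iid processes).

As an example of $p_X$ with rank at least $d$, see example 3.8 where the finite submatrix
of $\mathcal{P}_{p_X}$ (there $\Sigma$ was $\{a, b\}$)

$$\begin{pmatrix}
p(\epsilon) & p(a) & \ldots & p(a^{d-1}) \\
p(a) & p(aa) & \ldots & p(a^{d}) \\
\vdots & \vdots & & \vdots \\
p(a^{d-1}) & p(a^{d}) & \ldots & p(a^{2d-2})
\end{pmatrix} \in [0, 1]^{d \times d} \qquad (6.6)$$

has rank $d$.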
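
The rank-one statement for iid processes can be verified on a finite block. The following Python sketch builds the block of the Hankel matrix indexed by strings of length at most 2 for a product-form string function and computes its rank exactly; the helper names and the symbol probabilities in `rho` are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

def rank(rows):
    """Rank via Gaussian elimination; exact when the entries are Fractions."""
    m = [list(r) for r in rows]
    rk = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(rk, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[rk], m[piv] = m[piv], m[rk]
        for i in range(rk + 1, len(m)):
            factor = m[i][c] / m[rk][c]
            m[i] = [x - factor * y for x, y in zip(m[i], m[rk])]
        rk += 1
    return rk

rho = {"0": F(1, 3), "1": F(2, 3)}   # hypothetical symbol probabilities

def p_iid(v):
    """Product form (6.5): p(a_1 ... a_n) = rho_{a_1} * ... * rho_{a_n}."""
    out = F(1)
    for a in v:
        out *= rho[a]
    return out

words = ["".join(t) for k in range(3) for t in product("01", repeat=k)]
block = [[p_iid(v + w) for w in words] for v in words]
block_rank = rank(block)
```

Each entry factors as $p(v)\,p(w)$, so the block is an outer product of a vector with itself and has rank one.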



In case of a finitary process $(X_t)$, the obvious question is whether the rank of $(X_t)$ (see
definition 3.2) is the rank of $p_X$ as a string function. By generalizing previous work [20]
one can give an affirmative answer to this question. This establishes that $\mathcal{M}_d$
precisely contains the parametrizations giving rise to process functions of rank $\le d$.
Theorem 6.5. Let $p : \Sigma^* \to \mathbb{C}$ be a process function. Then the following
conditions are equivalent.

(i) $p$ is finitary of rank at most $d$.

(ii) There exist vectors $x, y \in \mathbb{C}^d$ as well as matrices
$T_a \in \mathbb{C}^{d \times d}$ for all $a \in \Sigma$ such that

$$\forall v \in \Sigma^* : p(v = a_1 \ldots a_n) = x' T_{a_1} \cdots T_{a_n} y \quad \text{and} \quad \Big(\sum_{a \in \Sigma} T_a\Big) y = y. \qquad (6.7)$$

(iii) There exists a vector $x \in \mathbb{C}^d$ as well as matrices
$T_a \in \mathbb{C}^{d \times d}$ for all $a \in \Sigma$ such that

$$\forall v \in \Sigma^* : p(v = a_1 \ldots a_n) = x' T_{a_1} \cdots T_{a_n} 1 \quad \text{and} \quad \Big(\sum_{a \in \Sigma} T_a\Big) 1 = 1 \qquad (6.8)$$

where $1 = (1, \ldots, 1)' \in \mathbb{C}^d$ is the vector of all ones.

Proof. (ii) $\Leftrightarrow$ (iii) (where (iii) trivially implies (ii)) follows from the
observation that an invertible linear map $S : \mathbb{C}^d \to \mathbb{C}^d$ such that $S1 = y$ yields

$$x' T_{a_1} \cdots T_{a_n} y = x' S S^{-1} T_{a_1} S S^{-1} \cdots S S^{-1} T_{a_n} S S^{-1} y = \tilde{x}' \tilde{T}_{a_1} \cdots \tilde{T}_{a_n} 1 \qquad (6.9)$$

where $\tilde{T}_{a_i} = S^{-1} T_{a_i} S$ and $\tilde{x} = S' x$. (iii) $\Rightarrow$ (i) follows from
generalizing arguments from previous work [29, 36] for the real-valued case to complex-valued
string functions. In fact, it is immediate to observe that the arguments apply for arbitrary
fields $k$ and the corresponding process functions $p : \Sigma^* \to k$. Last, (i) $\Rightarrow$ (ii)
consists of generalizing equivalent statements available for stochastic processes [29, 36] to
process functions, where the only difference is that not necessarily
$\sum_{v \in \Sigma^n} p(v) = 1$ for process functions, which is not needed in the proofs in
[29, 36]. □

Finite Algebraic Relationships. In the following, we write

$$\mathcal{P}_{p,m,n} := [p(vw)]_{|v| \le m,\, |w| \le n} \in \mathbb{C}^{\frac{|\Sigma|^{m+1}-1}{|\Sigma|-1} \times \frac{|\Sigma|^{n+1}-1}{|\Sigma|-1}} \tag{6.10}$$

for the upper left submatrices of $\mathcal{P}_p$ which refer to prefixes and suffixes of length at most
$m$ and $n$. As a starting point for the following, note that well-known arguments (e.g. [36,
lemma 2.4]) show that

$$\operatorname{rk} \mathcal{P}_p = \operatorname{rk} \mathcal{P}_{p,d-1,d-1} \tag{6.11}$$

and further that a process function of rank at most $d$ is uniquely determined by the values

$$p(v), \quad |v| = 2d-1 \tag{6.12}$$

which, in combination with lemma 5.3, implies that $d$-state hidden Markov processes are
generically identifiable from their probabilities on strings of length $2d - 1$. Note that [1]
demonstrate that hidden Markov processes, in the case of alphabets of size larger than 2,
are generically identifiable already from distributions over strings of length smaller than
$2d - 1$. However, hidden Markov processes are only generically, but not necessarily globally
uniquely, determined by their distributions on strings of length smaller than $2d - 1$. While
we believe that the work of [1] can be employed, where applicable, to lower the bounds in this
treatment as well, $2d - 1$ remains the lowest bound for binary-valued processes presented so far.
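To make (6.10) and (6.11) concrete, the following sketch builds the blocks $\mathcal{P}_{p,m,n}$ for a small example and computes their rank exactly over the rationals. It is an illustration under stated assumptions, not part of the original treatment: the two-state parameters `T` and `PI` are hypothetical, rows are indexed by prefixes $v$ and columns by suffixes $w$ as in (6.10), and `rank` is a plain Gaussian elimination.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical 2-state example: T[a][i][j] = M[i][j] * P(state j emits a),
# so that T["0"] + T["1"] is a row-stochastic transition matrix.
T = {
    "0": [[F(27, 40), F(1, 20)], [F(3, 10), F(2, 15)]],
    "1": [[F(3, 40), F(1, 5)], [F(1, 30), F(8, 15)]],
}
PI = [F(1, 2), F(1, 2)]

def p(v):
    """Process function p(v) = pi' T_{a_1} ... T_{a_n} 1."""
    row = PI
    for a in v:
        row = [sum(row[i] * T[a][i][j] for i in range(2)) for j in range(2)]
    return sum(row)

def strings_up_to(k, alphabet="01"):
    """All strings of length at most k, shortest first."""
    return ["".join(t) for n in range(k + 1) for t in product(alphabet, repeat=n)]

def hankel_block(m, n):
    """P_{p,m,n} = [p(vw)] with |v| <= m (rows) and |w| <= n (columns)."""
    return [[p(v + w) for w in strings_up_to(n)] for v in strings_up_to(m)]

def rank(rows):
    """Rank via exact Gaussian elimination over the rationals."""
    m = [[F(x) for x in r] for r in rows]
    rk = 0
    for c in range(len(m[0]) if m else 0):
        piv = next((i for i in range(rk, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[rk], m[piv] = m[piv], m[rk]
        for i in range(rk + 1, len(m)):
            f = m[i][c] / m[rk][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk
```

For this example with $d = 2$, `rank(hankel_block(1, 1))` and `rank(hankel_block(2, 2))` agree, in line with (6.11): enlarging the block beyond prefixes and suffixes of length $d - 1$ does not increase the rank.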

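Remark 6.6 (Stationarity). Let $p$ represent a stochastic process and let $(\mathcal{P}_p)_v : \Sigma^* \to \mathbb{C}$
be the $v$-row in $\mathcal{P}_p$ (that is, $(\mathcal{P}_p)_v(w) = p(vw)$) and $(\mathcal{P}_p)^w : \Sigma^* \to \mathbb{C}$ be the $w$-column of
$\mathcal{P}_p$ (that is, $(\mathcal{P}_p)^w(v) = p(vw)$). Since $p$ is a process, we have

$$\sum_{a \in \Sigma} (\mathcal{P}_p)^{wa} = (\mathcal{P}_p)^w \tag{6.13}$$

which is a reformulation of the recurring theme (2.3). If $p$ is a stationary process,
(2.6) translates to

$$\sum_{a \in \Sigma} (\mathcal{P}_p)_{av} = (\mathcal{P}_p)_v \tag{6.14}$$

which removes a certain "asymmetry" in $\mathcal{P}_p$.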

We pause for a moment and summarize. Theorem 6.5 states that the finitary processes
of rank at most $d$ are precisely the ones whose process functions give rise to Hankel matrices of
rank at most $d$. In terms of polynomial equations, this formally translates to

$$\operatorname{rk} p \le d \quad\text{if and only if}\quad \det\,[p(v_i w_j)]_{1 \le i,j \le d+1} = 0 \tag{6.15}$$

for all choices of strings $v_1, \dots, v_{d+1}, w_1, \dots, w_{d+1} \in \Sigma^*$. In other words, $p$ is finitary of rank at most $d$ if and
only if all $(d+1) \times (d+1)$-minors of its Hankel matrix $\mathcal{P}_p$ are zero. This, in turn, translates to
polynomial relationships in the ring $\mathbb{C}[p_v \mid v \in \Sigma^*]$, which has infinitely many indeterminates.
We, however, are looking for defining polynomial equations in $\mathbb{C}[p_v \mid v \in \Sigma^n]$ for $n \ge 2d - 1$.

To this end, note that indeterminates $p_v$ where $|v| < n$ are not an issue, due to (2.3).
However, we still have to get rid of all $p_v$ where $|v| > n$. We will do that in the next
subsection.
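As a small worked instance of the determinantal condition (6.15), consider $d = 1$ and the choice $v_1 = w_1 = \epsilon$, $v_2 = w_2 = 0$. The i.i.d. process used below is our own illustrative assumption; it is not an example taken from the text.

```latex
% One of the 2x2 minors required to vanish for rank at most d = 1:
\det \begin{pmatrix} p(\epsilon) & p(0) \\ p(0) & p(00) \end{pmatrix}
  = p(\epsilon)\, p(00) - p(0)^2 = 0 .
% For an i.i.d. binary process with p(0) = q we have
% p(\epsilon) = 1 and p(00) = q^2, so the minor equals q^2 - q^2 = 0.
```

Here the minor involves strings of length at most 2; by (2.3) each of $p(\epsilon)$, $p(0)$, $p(00)$ can be rewritten as a sum of probabilities of strings of any fixed length $n \ge 2$.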

6.2. Ideals and Varieties 

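Let

$$I_{d+1,n} := \big\langle \det V \;\big|\; V \in \mathbb{C}^{(d+1) \times (d+1)} \text{ submatrix of } \mathcal{P}_{p,\lfloor n/2 \rfloor,\lceil n/2 \rceil} \text{ or } \mathcal{P}_{p,\lceil n/2 \rceil,\lfloor n/2 \rfloor} \big\rangle$$
$$J_{d,n} := \big\langle \det V \;\big|\; V \in \mathbb{C}^{d \times d} \text{ submatrix of } \mathcal{P}_{p,d-1,d-1} \big\rangle \tag{6.16}$$

$I_{d+1,n}$ is the ideal of all $(d+1)$-minors in either $\mathcal{P}_{p,\lfloor n/2 \rfloor,\lceil n/2 \rceil}$ or $\mathcal{P}_{p,\lceil n/2 \rceil,\lfloor n/2 \rfloor}$, whereas $J_{d,n}$ is
the ideal of all $d$-minors in $\mathcal{P}_{p,d-1,d-1}$. Let $\operatorname{rad} I$ be the radical of an ideal $I$ and $I : J$ the
quotient ideal of $I$ with respect to $J$. A characterization of the ideal of invariants of the
finitary model, which in the case $|\Sigma| = 2$ agrees with the ideal of invariants of the hidden
Markov model, reads as follows: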
Theorem 6.7. Let $n \ge 2d - 1$. Then

$$I_{\mathcal{M}_{d,n}} = \operatorname{rad} I_{d+1,n} : J_{d,n}.$$

For $|\Sigma| = 2$, also

$$I_{\mathcal{H}_{d,n}} = \operatorname{rad} I_{d+1,n} : J_{d,n}. \tag{6.17}$$

Computations with Bertini [5] confirm that the quotient operation is necessary since
$I_{3,4}$ is not prime. However, it remains an open problem whether the radical operation is
necessary. Macaulay [30] computations reveal that it is not for $d = 2$, $n = 3$. Macaulay
and Bertini computations furthermore confirm our dimension computations.

The proof of this theorem is based on a set-theoretic lemma which makes use of the 
insights assembled in the earlier chapters. 

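Lemma 6.8. Let $n \ge 2d - 1$ and $(p(v))_{v \in \Sigma^n} \in \mathbb{C}^{|\Sigma|^n}$. The following statements are
equivalent:

(i)
$$(p(v))_{v \in \Sigma^n} \in \operatorname{Im} f_{\mathcal{M}_{d,n}} \setminus \operatorname{Im} f_{\mathcal{M}_{d-1,n}} \tag{6.18}$$

(ii)
$$\operatorname{rk} \mathcal{P}_{p,d-1,d-1} = \operatorname{rk} \mathcal{P}_{p,\lfloor n/2 \rfloor,\lceil n/2 \rceil} = \operatorname{rk} \mathcal{P}_{p,\lceil n/2 \rceil,\lfloor n/2 \rfloor} = d \tag{6.19}$$

In the case of (6.19), one can choose parameters for $(p(v))_{v \in \Sigma^n}$ by determining an invertible
submatrix

$$V = [p(v_i w_j)]_{1 \le i,j \le d} \in \mathbb{C}^{d \times d} \tag{6.20}$$

from $\mathcal{P}_{p,d-1,d-1}$ and setting

$$x' := (p(w_1), \dots, p(w_d)) \tag{6.21}$$

$$y := V^{-1} \begin{pmatrix} p(v_1) \\ \vdots \\ p(v_d) \end{pmatrix} \tag{6.22}$$

$$T_a := V^{-1} W_a := V^{-1} [p(v_i a w_j)]_{1 \le i,j \le d} \tag{6.23}$$

which yields $p(v = a_1 \dots a_n) = x' T_{a_1} \cdots T_{a_n} y$, so that we obtain

$$p(v) = \pi' T_{a_1} \cdots T_{a_n} \mathbf{1} \tag{6.24}$$

by further application of theorem 6.5. Note that probabilities in $W_a$ may refer to strings
$v_i a w_j$ of length up to $2d - 1$, which explains the necessity of the assumption $n \ge 2d - 1$.

Note that $n \ge 2d - 1$ implies that $d - 1 < \lceil n/2 \rceil$ (and $d - 1 \le \lfloor n/2 \rfloor$), which shows that

$$\mathcal{P}_{p,d-1,d-1} \subsetneq \mathcal{P}_{p,\lfloor n/2 \rfloor,\lceil n/2 \rceil} \quad\text{and}\quad \mathcal{P}_{p,d-1,d-1} \subsetneq \mathcal{P}_{p,\lceil n/2 \rceil,\lfloor n/2 \rfloor} \tag{6.25}$$
where we write $A \subsetneq B$ for a submatrix $A$ of a matrix $B$ which is strictly smaller than $B$.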
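The parameter choice (6.20) to (6.23) uses only elementary linear algebra. The sketch below carries it out for $d = 2$ with exact rational arithmetic. It is an illustration under stated assumptions: the process function `p` comes from hypothetical two-state parameters `TRUE_T` and `PI`, the index strings $v_i, w_j \in \{\epsilon, 0\}$ are one possible choice yielding an invertible $V$ for this example, and `inverse` and `matmul` are small helpers written for this purpose.

```python
from fractions import Fraction as F

# Hypothetical ground-truth process of rank 2 (used only to supply values p(v)).
TRUE_T = {
    "0": [[F(27, 40), F(1, 20)], [F(3, 10), F(2, 15)]],
    "1": [[F(3, 40), F(1, 5)], [F(1, 30), F(8, 15)]],
}
PI = [F(1, 2), F(1, 2)]

def p(v):
    row = PI
    for a in v:
        row = [sum(row[i] * TRUE_T[a][i][j] for i in range(2)) for j in range(2)]
    return sum(row)

def matmul(A, B):
    return [[sum(a * b for a, b in zip(r, c)) for c in zip(*B)] for r in A]

def inverse(A):
    """Gauss-Jordan inverse over the rationals; assumes A is invertible."""
    n = len(A)
    aug = [[F(x) for x in A[i]] + [F(int(i == j)) for j in range(n)] for i in range(n)]
    for c in range(n):
        piv = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[piv] = aug[piv], aug[c]
        aug[c] = [x / aug[c][c] for x in aug[c]]
        for i in range(n):
            if i != c and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [x - f * z for x, z in zip(aug[i], aug[c])]
    return [r[n:] for r in aug]

vs, ws = ["", "0"], ["", "0"]                      # |v_i|, |w_j| <= d - 1 = 1
V_inv = inverse([[p(v + w) for w in ws] for v in vs])                  # (6.20)
x = [p(w) for w in ws]                                                 # (6.21)
y = [r[0] for r in matmul(V_inv, [[p(v)] for v in vs])]                # (6.22)
T_hat = {a: matmul(V_inv, [[p(v + a + w) for w in ws] for v in vs])    # (6.23)
         for a in "01"}

def p_hat(v):
    """x' T_{a_1} ... T_{a_n} y with the inferred parameters."""
    row = [x]
    for a in v:
        row = matmul(row, T_hat[a])
    return sum(r * c for r, c in zip(row[0], y))
```

The longest strings queried are of the form $v_i a w_j$ with length at most $3 = 2d - 1$, and the inferred parameters reproduce `p` on longer strings as well, since the example process has rank 2.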
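Example 6.9. Let $n = 3$, $d = 2$ and $\Sigma = \{0, 1\}$. Hence $\lceil n/2 \rceil = 2$ and $\lfloor n/2 \rfloor = 1$ such that
we have

$$\mathcal{P}_{p,2,1} = \begin{pmatrix}
p(\epsilon) & p(0) & p(1) \\
p(0) & p(00) & p(10) \\
p(1) & p(01) & p(11) \\
p(00) & p(000) & p(100) \\
p(01) & p(001) & p(101) \\
p(10) & p(010) & p(110) \\
p(11) & p(011) & p(111)
\end{pmatrix} \in \mathbb{C}^{7 \times 3}$$

and

$$\mathcal{P}_{p,1,2} = \begin{pmatrix}
p(\epsilon) & p(0) & p(1) & p(00) & p(01) & p(10) & p(11) \\
p(0) & p(00) & p(10) & p(000) & p(010) & p(100) & p(110) \\
p(1) & p(01) & p(11) & p(001) & p(011) & p(101) & p(111)
\end{pmatrix} \in \mathbb{C}^{3 \times 7}$$

$$\mathcal{P}_{p,d-1,d-1} = \mathcal{P}_{p,1,1} = \begin{pmatrix}
p(\epsilon) & p(0) & p(1) \\
p(0) & p(00) & p(10) \\
p(1) & p(01) & p(11)
\end{pmatrix} \in \mathbb{C}^{3 \times 3}.$$

In these displays the entry in the row labelled $r$ and the column labelled $c$ is $p(cr)$.

We recall the relationship $p(v) = \sum_{w \in \Sigma^{n - |v|}} p(vw)$ (6.1), that is, for example,

$$\begin{aligned}
p(00) &= p(000) + p(001) \\
p(1) &= p(100) + p(101) + p(110) + p(111) \\
p(\epsilon) &= p(000) + p(001) + \dots + p(110) + p(111)
\end{aligned}$$

which serves to have expressions in strings of length $n = 3$ only. Since $\mathcal{P}_{p,d-1,d-1}$ is
a submatrix of both $\mathcal{P}_{p,\lfloor n/2 \rfloor,\lceil n/2 \rceil}$ and $\mathcal{P}_{p,\lceil n/2 \rceil,\lfloor n/2 \rfloor}$, we can decompose (6.19) into

$$\operatorname{rk} \mathcal{P}_{p,1,1} \ge 2 \tag{6.26}$$
$$\operatorname{rk} \mathcal{P}_{p,1,2} \le 2 \quad\text{and} \tag{6.27}$$
$$\operatorname{rk} \mathcal{P}_{p,2,1} \le 2. \tag{6.28}$$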
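The marginalization (6.1) used in this example can be sketched in a few lines. The distribution `P3` on binary strings of length 3 is a hypothetical choice made only for illustration; any nonnegative weights summing to one would do.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical distribution on binary strings of length n = 3.
P3 = {
    "000": F(1, 4), "001": F(1, 8), "010": F(1, 8), "011": F(1, 16),
    "100": F(1, 8), "101": F(1, 8), "110": F(1, 8), "111": F(1, 16),
}

def p(v, n=3):
    """p(v) as the sum of p(vw) over all w of length n - |v|, see (6.1)."""
    return sum(P3[v + "".join(w)] for w in product("01", repeat=n - len(v)))
```

With this helper, every entry of the three matrices above is available from the length-3 probabilities alone, for instance `p("00") == P3["000"] + P3["001"]`.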



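Proof. [Proof of lemma 6.8] (i) $\Rightarrow$ (ii): Let $(p(v))_{v \in \Sigma^n}$ be in the image of $f_{\mathcal{M}_{d,n}}$, but
not in the image of $f_{\mathcal{M}_{d-1,n}}$. Theorem 6.5 in combination with (6.11) reveals that

$$d = \operatorname{rk} p = \operatorname{rk} \mathcal{P}_p = \operatorname{rk} \mathcal{P}_{p,d-1,d-1} \tag{6.29}$$

where the second equation is just the definition of the rank of a string function. Since
$\operatorname{rk} \mathcal{P}_{p,d-1,d-1} \le \operatorname{rk} \mathcal{P}_{p,\lfloor n/2 \rfloor,\lceil n/2 \rceil},\ \operatorname{rk} \mathcal{P}_{p,\lceil n/2 \rceil,\lfloor n/2 \rfloor} \le \operatorname{rk} \mathcal{P}_p$ (see (6.25)), we obtain the claim.

(ii) $\Rightarrow$ (i): Let $P := (p(u))_{u \in \Sigma^n} \in \mathbb{C}^{|\Sigma|^n}$ such that (6.19) applies. $\operatorname{rk} \mathcal{P}_{p,d-1,d-1} = d$
implies $P \notin \operatorname{Im} f_{\mathcal{M}_{d-1,n}}$, since otherwise any submatrix of $\mathcal{P}_p$ would have rank at most $d - 1$. In order to
show that $P \in \operatorname{Im} f_{\mathcal{M}_{d,n}}$ we will demonstrate that determining $V, x, y, (T_a)_{a \in \Sigma}$ according
to (6.20), (6.21), (6.22), (6.23) yields

$$p(u = a_1 \dots a_n) = x' T_{a_1} \cdots T_{a_n} y \tag{6.30}$$
$$\Big(\sum_{a \in \Sigma} T_a\Big) y = y. \tag{6.31}$$

Applying (ii) $\Rightarrow$ (iii) in theorem 6.5 to $x$, $y$ and the $T_a$ then proves the claim.

The proof concludes by use of the following two elementary sublemmata 6.10 and 6.11.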

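Lemma 6.10. Let $v = a_1 \dots a_m \in \Sigma^*$ such that $|v| = m \le \lceil n/2 \rceil$. Let $T_v = T_{a_1} \cdot \ldots \cdot T_{a_m}$.
Then

$$x' T_v = (p(v w_1), \dots, p(v w_d)). \tag{6.32}$$

Proof. [Proof of Lemma 6.10] By induction on $|v|$, we immediately obtain a proof by
showing

$$(p(v w_1), \dots, p(v w_d))\, T_a = (p(v a w_1), \dots, p(v a w_d)) \tag{6.33}$$

for all $v \in \Sigma^*$, $a \in \Sigma$ with $|v| < n/2$. To this end, note first that $|w_j| \le d - 1 < n/2$ implies
$|a w_j| \le \lceil n/2 \rceil$. Hence both $|va|, |a w_j| \le \lceil n/2 \rceil$. Furthermore, $|v| < n/2$ implies $|v| \le \lfloor n/2 \rfloor$,
and $\operatorname{rk} \mathcal{P}_{p,d-1,d-1} = \operatorname{rk} \mathcal{P}_{p,\lfloor n/2 \rfloor,\lceil n/2 \rceil}$ from (6.19) implies that the $v$-row $(\mathcal{P}_p)_v$ in $\mathcal{P}_{p,\lfloor n/2 \rfloor,\lceil n/2 \rceil}$ is
contained in the span of the rows $(\mathcal{P}_p)_{v_i}$, by choice of the $v_i$ (6.20). Accordingly, we
determine $\alpha_i$, $i = 1, \dots, d$ such that

$$(\mathcal{P}_p)_v = \sum_{i=1}^{d} \alpha_i (\mathcal{P}_p)_{v_i} \quad\text{which translates to}\quad p(vw) = \sum_{i=1}^{d} \alpha_i\, p(v_i w) \tag{6.34}$$

for all $w$ with $|w| \le \lceil n/2 \rceil$, by definition of $\mathcal{P}_{p,\lfloor n/2 \rfloor,\lceil n/2 \rceil}$. As $|a w_j| \le \lceil n/2 \rceil$ for all $j = 1, \dots, d$, we obtain

$$(p(v a w_1), \dots, p(v a w_d)) = \sum_{i=1}^{d} \alpha_i\, (p(v_i a w_1), \dots, p(v_i a w_d)). \tag{6.35}$$

This is the key insight. We finally compute

$$\begin{aligned}
(p(v w_1), \dots, p(v w_d))\, T_a &= \sum_{i=1}^{d} \alpha_i\, (p(v_i w_1), \dots, p(v_i w_d))\, T_a \\
&= \sum_{i=1}^{d} \alpha_i\, (p(v_i w_1), \dots, p(v_i w_d))\, V^{-1} W_a = \sum_{i=1}^{d} \alpha_i\, e_i' W_a \\
&= \sum_{i=1}^{d} \alpha_i\, (p(v_i a w_1), \dots, p(v_i a w_d)) \overset{(6.35)}{=} (p(v a w_1), \dots, p(v a w_d)).
\end{aligned} \tag{6.36}$$

□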
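Lemma 6.11. For all $v, w \in \Sigma^*$ such that $|v| \le \lceil n/2 \rceil$, $|w| \le \lfloor n/2 \rfloor$ (hence $|vw| \le n$), and with
$T_w = T_{a_1} \cdots T_{a_k}$ for $w = a_1 \dots a_k \in \Sigma^k$:

$$(p(v w_1), \dots, p(v w_d))\, T_w\, y = p(vw). \tag{6.37}$$

Proof. [Proof of Lemma 6.11] We do this by induction on $|w|$, starting with $|w| = 0$,
that is $w = \epsilon$ and $T_\epsilon = V^{-1} W_\epsilon = \mathrm{Id}$. Due to $\operatorname{rk} \mathcal{P}_{p,d-1,d-1} = \operatorname{rk} \mathcal{P}_{p,\lceil n/2 \rceil,\lfloor n/2 \rfloor}$, by (6.19),
the row $(\mathcal{P}_p)_v$ in $\mathcal{P}_{p,\lceil n/2 \rceil,\lfloor n/2 \rfloor}$ is contained in the span of the rows $(\mathcal{P}_p)_{v_i}$, by choice of the $v_i$
(6.20). Therefore, it suffices to show the statement for $v = v_i$, where we write $V_i$ for the
$i$-th row of $V$:

$$(p(v_i w_1), \dots, p(v_i w_d))\, T_\epsilon\, y = V_i V^{-1} \begin{pmatrix} p(v_1) \\ \vdots \\ p(v_d) \end{pmatrix} = e_i' \begin{pmatrix} p(v_1) \\ \vdots \\ p(v_d) \end{pmatrix} = p(v_i). \tag{6.38}$$

For the step $|w| \to |w| + 1$, let $\tilde{w} = aw$ with $a \in \Sigma$. Note that, by arguments
analogous to those for the start $|w| = 0$ above, it suffices to consider $v = v_i$ referring to one
of the row space generators $(\mathcal{P}_p)_{v_i}$ (while the induction hypothesis already holds for all $v$):

$$\begin{aligned}
(p(v_i w_1), \dots, p(v_i w_d))\, T_{\tilde{w}}\, y &= V_i T_a T_w y = V_i V^{-1} W_a T_w y \\
&= e_i' W_a T_w y = (p(v_i a w_1), \dots, p(v_i a w_d))\, T_w\, y \overset{(*)}{=} p(v_i a w) = p(v_i \tilde{w})
\end{aligned} \tag{6.39}$$

where $(*)$ is the induction hypothesis with $v = v_i a$ (note that $|v_i a| \le d \le \lceil n/2 \rceil$). □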

Proof of lemma 6.8 cont. Let $u \in \Sigma^*$ such that $|u| \le n$. Split $u = vw$ into two strings
$v, w$ such that $|v| \le \lceil n/2 \rceil$, $|w| \le \lfloor n/2 \rfloor$. We compute

$$x' T_u y = x' T_v T_w y \overset{(L.\,6.10)}{=} (p(v w_1), \dots, p(v w_d))\, T_w\, y \overset{(L.\,6.11)}{=} p(vw) = p(u). \tag{6.40}$$
This yields (6.30). To show (6.31) we compute

$$\begin{aligned}
(p(v_i w_1), \dots, p(v_i w_d)) \Big(\sum_{a \in \Sigma} T_a\Big) y &= \sum_{a \in \Sigma} (p(v_i w_1), \dots, p(v_i w_d))\, T_a\, y \\
&\overset{(L.\,6.11)}{=} \sum_{a \in \Sigma} p(v_i a) = p(v_i) \overset{(L.\,6.11)}{=} (p(v_i w_1), \dots, p(v_i w_d))\, y
\end{aligned} \tag{6.41}$$

which yields the claim since $\operatorname{span}\{(p(v_i w_1), \dots, p(v_i w_d)) \mid i = 1, \dots, d\} = \mathbb{C}^d$. □

The step from the set-theoretic lemma 6.8 to proving our ideal-theoretic theorem 6.7
now follows from standard algebraic arguments, as e.g. listed in [10]. In the following, $\overline{A}$
denotes the Zariski closure of a set $A$, which is the smallest affine algebraic variety which
contains the set $A$, see [10, sec. 4.4, def. 2].
In the following, we use

$$F_d := \operatorname{Im} f_{\mathcal{M}_{d,n}}$$
as a simpler notation for the image of $f_{\mathcal{M}_{d,n}}$.

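Proof of theorem 6.7: We first compute

$$\begin{aligned}
V_{\mathcal{M}_{d,n}} = \overline{F_d} &= \overline{(F_d \setminus F_{d-1}) \cup (F_{d-1} \setminus F_{d-2}) \cup \dots \cup (F_1 \setminus F_0) \cup F_0} \\
&= \overline{F_d \setminus F_{d-1}} \cup \overline{F_{d-1} \setminus F_{d-2}} \cup \dots \cup \overline{F_1 \setminus F_0} \cup \overline{F_0}
\end{aligned} \tag{6.42}$$

where the last equation is an obvious consequence of the definition of the Zariski closure,
where one notes that the Zariski closure agrees with the topological closure if the latter
already is a variety. The irreducibility of $V_{\mathcal{M}_{d,n}}$ implies, by definition of irreducibility,

$$\exists\, e = 1, \dots, d : \quad V_{\mathcal{M}_{d,n}} = \overline{F_e \setminus F_{e-1}} \quad\text{or}\quad V_{\mathcal{M}_{d,n}} = \overline{F_0}. \tag{6.43}$$
As $F_0$ is just one point, $\dim F_0 = 0$, and from theorem 5.2 we know that

$$\dim \overline{F_e \setminus F_{e-1}} \le \dim \overline{F_e} = \begin{cases} 1 & e = 1 \\ e^2(|\Sigma| - 1) + e & e = 2, \dots, d \end{cases} \tag{6.44}$$

As, by theorem 5.2, $\dim V_{\mathcal{M}_{d,n}} = (|\Sigma| - 1)d^2 + d$, (6.43) and (6.44) together imply

$$V_{\mathcal{M}_{d,n}} = \overline{F_d \setminus F_{d-1}}. \tag{6.45}$$
By (6.25), it follows that (6.19) is equivalent to

$$\operatorname{rk} \mathcal{P}_{p,\lfloor n/2 \rfloor,\lceil n/2 \rceil},\ \operatorname{rk} \mathcal{P}_{p,\lceil n/2 \rceil,\lfloor n/2 \rfloor} \le d \quad\text{and}\quad \operatorname{rk} \mathcal{P}_{p,d-1,d-1} \ge d. \tag{6.46}$$
Application of lemma 6.8 reveals that

$$F_d \setminus F_{d-1} = A_{d+1,n} \setminus B_d \tag{6.47}$$

where

$$A_{d+1,n} := \{ (p(v))_{v \in \Sigma^n} \mid \det\,(p(u_i v_j))_{1 \le i,j \le d+1} = 0 \;\; \forall\, 0 \le |u_i|, |v_j| \le \lceil n/2 \rceil,\ |u_i v_j| \le n \}$$
$$B_d := \{ (p(v))_{v \in \Sigma^n} \mid \det\,(p(u_i v_j))_{1 \le i,j \le d} = 0 \;\; \forall\, 0 \le |u_i|, |v_j| \le d - 1 \}$$

since $A_{d+1,n}$ consists of all $p$ such that all $(d+1)$-minors in $\mathcal{P}_{p,\lfloor n/2 \rfloor,\lceil n/2 \rceil}$ and $\mathcal{P}_{p,\lceil n/2 \rceil,\lfloor n/2 \rfloor}$ are
zero, whereas $B_d$ encompasses all $p$ such that all $d$-minors in $\mathcal{P}_{p,d-1,d-1}$ are zero, so that
removing $B_d$ leaves exactly those $p$ for which not all $d$-minors in $\mathcal{P}_{p,d-1,d-1}$ vanish.

As zero sets of determinantal, hence polynomial, equations, both $A_{d+1,n}$ and $B_d$ are
varieties, and recalling the definition (6.16) of $I_{d+1,n}$ and $J_{d,n}$, we can conclude that these
are just the ideals associated with $A_{d+1,n}$ and $B_d$. By Hilbert's Nullstellensatz (see [10,
p. 174, theorem 6]):

$$I(A_{d+1,n}) = \operatorname{rad} I_{d+1,n} \quad\text{and}\quad I(B_d) = \operatorname{rad} J_{d,n}. \tag{6.48}$$

The claim of theorem 6.7 now follows from the interrelationship between quotients of ideals
and differences of varieties, as explicitly expressed by plugging $\operatorname{rad} I_{d+1,n}$ and $J_{d,n}$ into
$I$ and $J$ of the second statement of [10, p. 192, th. 7] (note that $k$ there becomes the
algebraically closed $\mathbb{C}$ here). □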



7. Algorithm 

Let $\Sigma := \{a, b\}$ be a binary-valued alphabet. The following algorithm determines
whether a probability distribution $P : \Sigma^n \to [0, 1]$ is due to a HMP on at most $d \le \frac{n+1}{2}$
hidden states, as supported by

Theorem 7.1. Algorithm 7.2 below correctly decides and infers a HMP parametrization
with at most $d$ hidden states for all but a lower-dimensional subvariety in parameter space
$\mathcal{H}_{d,+}$, hence establishes a generic solution for problem 1.2.

See below for a proof and also the supporting lemma 7.3 for further explanations.

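Algorithm 7.2.

IdentifyHMP($P = (p(v))_{v \in \Sigma^n}$)
 1: $e \leftarrow 1$
 2: while $e \le d := \lfloor \frac{n+1}{2} \rfloor$ do
 3:   if $\operatorname{rk} \mathcal{P}_{p,e-1,e-1} = \operatorname{rk} \mathcal{P}_{p,\lfloor n/2 \rfloor,\lceil n/2 \rceil} = \operatorname{rk} \mathcal{P}_{p,\lceil n/2 \rceil,\lfloor n/2 \rfloor} = e$ then
 4:     $T_a, T_b, x \leftarrow$ InferFinitaryParam($P$, $e$)
 5:     if $\det\,[T_a + T_b] \ne 0$ and $T_a [T_a + T_b]^{-1}$ is diagonalizable
        such that all eigenvalues are different then
 6:       $M, O_a, O_b, \pi \leftarrow$ InferHMMParam($T_a, T_b, x$)
 7:       if $(M, O_a, O_b, \pi)$ is stochastic then
 8:         print 'HMP on e hidden states'
 9:         return $M, O_a, O_b, \pi$ as parametrization
10:       else
11:         print 'No HMP on d hidden states'
12:         return
13:       end if
14:     end if
15:   end if
16:   $e \leftarrow e + 1$
17: end while
18: print 'No HMP on d hidden states'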
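The rank test in step 3 of the algorithm can be sketched as follows. This is a hedged illustration rather than the full algorithm: it covers only the loop over $e$ and the three rank comparisons, uses exact rational arithmetic, and takes the alphabet to be `"01"` instead of $\{a, b\}$. The function name `step3_rank`, the helper `hmm_distribution` and the example parameters are hypothetical; the parameter inference steps are not included.

```python
from fractions import Fraction as F
from itertools import product

def rank(rows):
    """Rank via exact Gaussian elimination over the rationals."""
    m = [[F(x) for x in r] for r in rows]
    rk = 0
    for c in range(len(m[0]) if m else 0):
        piv = next((i for i in range(rk, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[rk], m[piv] = m[piv], m[rk]
        for i in range(rk + 1, len(m)):
            f = m[i][c] / m[rk][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk

def words(k):
    """All binary strings of length at most k."""
    return ["".join(t) for m in range(k + 1) for t in product("01", repeat=m)]

def step3_rank(P, n):
    """Smallest e <= floor((n+1)/2) passing the rank test of step 3, else None."""
    def p(v):  # marginalise the length-n distribution as in (6.1)
        return sum(P[v + "".join(w)] for w in product("01", repeat=n - len(v)))
    def block(m, k):  # rows: prefixes of length <= m, columns: suffixes of length <= k
        return [[p(v + w) for w in words(k)] for v in words(m)]
    lo, hi = n // 2, (n + 1) // 2
    r1, r2 = rank(block(lo, hi)), rank(block(hi, lo))
    for e in range(1, (n + 1) // 2 + 1):
        if rank(block(e - 1, e - 1)) == r1 == r2 == e:
            return e
    return None

def hmm_distribution(pi, T, n):
    """Length-n distribution of a hypothetical HMM given as pi' T_{a_1}...T_{a_n} 1."""
    P = {}
    for t in product("01", repeat=n):
        row = pi
        for a in t:
            row = [sum(row[i] * T[a][i][j] for i in range(len(pi)))
                   for j in range(len(pi))]
        P["".join(t)] = sum(row)
    return P

T2 = {"0": [[F(27, 40), F(1, 20)], [F(3, 10), F(2, 15)]],
      "1": [[F(3, 40), F(1, 5)], [F(1, 30), F(8, 15)]]}
P_hmm = hmm_distribution([F(1, 2), F(1, 2)], T2, 3)
P_iid = hmm_distribution([F(1)], {"0": [[F(1, 3)]], "1": [[F(2, 3)]]}, 3)
```

For these two hypothetical inputs with $n = 3$, the test passes at $e = 2$ for the two-state example and at $e = 1$ for the i.i.d. example. With floating-point data one would replace the exact rank by a numerical rank with a tolerance, a design choice the sketch deliberately avoids.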
InferFinitaryParam($P$, $e$) is a routine which computes an $e$-dimensional
parametrization $(T_a, T_b, x) \in \mathcal{M}_e$ for a finitary process. It works by computing $T_a, T_b$
and $x$ according to (6.20), (6.21), (6.22), (6.23) and subsequent application of theorem 6.5 (note
that any invertible $S$ such that $S^{-1} y = \mathbf{1}$ applies). According to lemma 6.8 this applies
in the case of

$$\operatorname{rk} \mathcal{P}_{p,e-1,e-1} = \operatorname{rk} \mathcal{P}_{p,\lfloor n/2 \rfloor,\lceil n/2 \rceil} = \operatorname{rk} \mathcal{P}_{p,\lceil n/2 \rceil,\lfloor n/2 \rfloor} = e$$
which is guaranteed by step 3.

InferHMMParam($T_a, T_b, x$) works if $[T_a + T_b]$ is invertible and $T_a [T_a + T_b]^{-1}$ is
diagonalizable such that all eigenvalues $\lambda_1, \dots, \lambda_e$ are different, by lemma 5.3. In this case,
one chooses $S \in \mathbb{C}^{e \times e}$ such that $S\mathbf{1} = \mathbf{1}$ [note that this is possible] and

$$S^{-1} T_a [T_a + T_b]^{-1} S = \operatorname{diag}(\lambda_1, \dots, \lambda_e).$$

One then computes

$$M = S^{-1} [T_a + T_b] S,$$
$$O_a, O_b = T_a M^{-1},\ T_b M^{-1},$$
$$\pi = S' x.$$

The proof of theorem 7.1 is based on the following lemma, for which we recall the
definition of $\mathcal{N}_d$, see (5.7).

Lemma 7.3. Algorithm 7.2 can decide incorrectly only if

$$P \in f_{\mathcal{H}_{e,n}}(\mathcal{N}_e)$$

for some $e = 1, \dots, d$, in which case it may mistakenly output 'No HMP on at most d hidden
states'.

With lemma 7.3 a proof of theorem 7.1 becomes easy:

Proof. [Proof of Theorem 7.1] By lemma 5.3, $\mathcal{N}_d$ forms a lower-dimensional variety in
$\mathcal{H}_d$ and further, by corollary 5.5, $\mathcal{N}_d \cap \mathcal{H}_{d,+}$ also forms a lower-dimensional semialgebraic
set in $\mathcal{H}_{d,+}$. □

We finalize: 

Proof. [Proof of Lemma 7.3] Let $\theta \in \mathcal{H}_{d,+}$ and

$$P = f_{\mathcal{H}_{d,n}}(\theta)$$

such that $P$ is incorrectly classified as 'No HMP' by algorithm 7.2. We have to show that

$$\theta \in \mathcal{N}_e \quad\text{for some } e = 1, \dots, d.$$
We recall the fundamental relationship (see propositions 4.3, 4.6)

$$f_{\mathcal{H}_{e,n}}(\mathcal{H}_{e,+}) \subset \operatorname{Im} f_{\mathcal{H}_{e,n}} \subset \operatorname{Im} f_{\mathcal{M}_{e,n}}. \tag{7.1}$$
In terms of (7.1), algorithm 7.2 tests for membership from right to left in the $e$-th
iteration of the while loop, thereby stepwise approving or rejecting that $P \in f_{\mathcal{H}_{e,n}}(\mathcal{H}_{e,+})$ $[\subset
f_{\mathcal{H}_{d,n}}(\mathcal{H}_{d,+})]$. First, by lemma 6.8, step 3 tests for

$$P \in (\operatorname{Im} f_{\mathcal{M}_{e,n}} \setminus \operatorname{Im} f_{\mathcal{M}_{e-1,n}}). \tag{7.2}$$

Note that the case $P \in \operatorname{Im} f_{\mathcal{M}_{e-1,n}}$ was excluded in the iteration before. This makes it possible to
infer an $e$-dimensional parametrization for the respective finitary process in step 6 (see
the description of InferFinitaryParam above). The if condition in step 7 finally is the
critical point; it determines whether

$$P \in f_{\mathcal{H}_{e,n}}(\mathcal{H}_{e,+} \setminus \mathcal{N}_e) \tag{7.3}$$

see the description of InferHMMParam. If not, the algorithm issues the output 'No
HMP'. This output is mistaken if $P \in f_{\mathcal{H}_{e,n}}(\mathcal{H}_{e,+} \cap \mathcal{N}_e) \subset f_{\mathcal{H}_{e,n}}(\mathcal{N}_e)$, and it is correct
if either $P \in f_{\mathcal{H}_{e,n}}(\mathcal{N}_e \setminus \mathcal{H}_{e,+})$ or $P \in \operatorname{Im} f_{\mathcal{M}_{e,n}} \setminus \operatorname{Im} f_{\mathcal{H}_{e,n}}$.

By lemma 5.3, the parameters inferred in step 7 are unique, up to permutations of
rows and columns. Therefore, steps 8 and 11 decide correctly. □